Efficient lα Distance Approximation for High Dimensional Data Using α-Stable Projection
Abstract
In recent years, large high-dimensional data sets have become commonplace in a wide range of applications in science and commerce. Techniques for dimension reduction are of primary concern in statistical analysis. Projection methods play an important role. We investigate the use of projection algorithms that exploit properties of the α-stable distributions. We show that lα distances and quasi-distances can be recovered from random projections with full statistical efficiency by L-estimation. The computational requirements of our algorithm are modest; after a once-and-for-all calculation to determine an array of length k, the algorithm runs in O(k) time for each distance, where k is the reduced dimension of the projection.
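The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the weight array here is a simple trimmed-mean L-estimator calibrated by simulation, not the fully efficient weights the abstract refers to, and the function names (`sample_symmetric_stable`, `trimmed_weights`, `estimate_distance`) and all parameter values are assumptions made for the example. Symmetric α-stable variates are drawn with the Chambers-Mallows-Stuck formula.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck draw from the symmetric alpha-stable law
    with characteristic function exp(-|t|**alpha), 0 < alpha <= 2."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))


def trimmed_weights(alpha, k, rng, trim=0.25, n_sim=2000):
    """One-off calculation of a length-k weight array (hypothetical choice).

    Uniform weight on the central order statistics, rescaled by simulation
    so that the L-statistic averages to 1 on unit-scale samples.
    """
    w = np.zeros(k)
    lo, hi = int(trim * k), int((1 - trim) * k)
    w[lo:hi] = 1.0 / (hi - lo)
    sims = np.sort(np.abs(sample_symmetric_stable(alpha, (n_sim, k), rng)), axis=1)
    return w / (sims @ w).mean()


def estimate_distance(weights, proj_u, proj_v):
    """L-estimate: weighted sum of the sorted absolute projected differences.

    The dot product is O(k); the sort in this sketch adds O(k log k).
    """
    return float(weights @ np.sort(np.abs(proj_u - proj_v)))


rng = np.random.default_rng(0)
alpha, D, k = 1.0, 1000, 400                      # illustrative values only
R = sample_symmetric_stable(alpha, (D, k), rng)   # projection matrix
u, v = rng.normal(size=D), rng.normal(size=D)     # toy high-dimensional points

weights = trimmed_weights(alpha, k, rng)          # computed once, reused
approx = estimate_distance(weights, u @ R, v @ R)
exact = float(np.sum(np.abs(u - v) ** alpha) ** (1.0 / alpha))
print(approx, exact)
```

Because each projected difference is distributed as the lα distance times a unit-scale α-stable variate, any scale-equivariant statistic of the absolute projections can be calibrated in this way; the choice of weights only affects the statistical efficiency.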
Similar resources
Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections
The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...
Using Stable Random Projections
Abstract Many tasks (e.g., clustering) in machine learning only require the lα distances instead of the original data. For dimension reductions in the lα norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the lα distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been ...
Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Let f : {1, 2, ..., N} → [0, ∞) be a non-negative signal, defined over a very large domain, and suppose that we want to be able to address approximate aggregate queries or point queries about f. To answer queries about f, we introduce a new type of random sketches called max-stable sketches. The (ideal precision) max-stable sketch of f, Ej(f), 1 ≤ j ≤ K, is defined as: Ej(f) := max 1≤i≤N f(i...
Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels
The method of stable random projections is popular for efficiently computing the lα distances in high dimension (where 0 < α ≤ 2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data an...
Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections
Abstract The method of stable random projections is popular in data stream computations, data mining, information retrieval, and machine learning, for efficiently computing the lα (0 < α ≤ 2) distances using a small (memory) space, in one pass of the data. We propose algorithms based on (1) the geometric mean estimator, for all 0 < α ≤ 2, and (2) the harmonic mean estimator, only for small α (e...
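As a rough illustration of the geometric mean estimator named above, the sketch below uses the form commonly stated in the stable random projections literature, where the normalizing constant comes from the fractional absolute moment of a symmetric α-stable variate. The function name `geometric_mean_estimate` and the α = 2 Gaussian demo are assumptions made for the example, not taken from the truncated abstract. The quantity estimated is d = Σ|u_i − v_i|^α, and k ≥ 2 is required.

```python
import math

import numpy as np


def geometric_mean_estimate(x, alpha):
    """Estimate d = sum_i |u_i - v_i|**alpha from k >= 2 projected differences.

    Assumes each x_j follows a symmetric alpha-stable law with
    characteristic function exp(-d * |t|**alpha).
    """
    k = len(x)
    # Work in logs so the product of k terms does not overflow or underflow.
    log_num = (alpha / k) * float(np.sum(np.log(np.abs(x))))
    log_den = k * math.log(
        (2.0 / math.pi)
        * math.gamma(alpha / k)
        * math.gamma(1.0 - 1.0 / k)
        * math.sin(math.pi * alpha / (2.0 * k))
    )
    return math.exp(log_num - log_den)


# Demo with alpha = 2, where the stable law exp(-t**2) is the N(0, 2) Gaussian.
rng = np.random.default_rng(1)
alpha, D, k = 2.0, 500, 1000                       # illustrative values only
R = rng.normal(0.0, math.sqrt(2.0), size=(D, k))   # S(2, 1) projection entries
u, v = rng.normal(size=D), rng.normal(size=D)

d_hat = geometric_mean_estimate((u - v) @ R, alpha)
d_exact = float(np.sum(np.abs(u - v) ** alpha))
print(d_hat, d_exact)
```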